Collaborating Authors

 Shenzhen


Humanoid workers and surveillance buggies: 'embodied AI' is reshaping daily life in China

The Guardian

On a misty Saturday afternoon in Shenzhen's Central Park, a gaggle of teenage girls are sheltering from the drizzle under a concrete canopy. With their bags of crisps piled high in front of them, they crowd around a couple of smartphones to sing along to Mandopop ballads. The sound of their laughter rings out across the surrounding lawn – until it is pierced by a mechanical buzzing sound. A few metres away from the impromptu karaoke session is an "airdrop cabinet", one of more than 40 in Shenzhen that is operated by Meituan, China's biggest food delivery platform. Hungry park-goers can order anything from rice noodles to Subway sandwiches to bubble tea.


U.K. raises alarm on Chinese drones used to survey sensitive sites

The Japan Times

U.K. government officials have raised private concerns that Chinese-manufactured drones are being used to take high resolution images of critical national infrastructure sites in the U.K., going against guidance from the country's security services. National Grid PLC, which operates the nation's electricity and gas networks, uses drones made by Shenzhen-based SZ DJI Technology to take videos, photographs and thermal images of its electricity substations, according to information posted on its website as recently as September. DJI drones have also been used to survey the construction of Electricite de France SA's Hinkley Point C nuclear power plant, to inspect solar farms, and by Thames Water to monitor reservoirs and the water supply. Deployment of the drones comes despite a warning in 2023 by the U.K.'s National Protective Security Authority (NPSA), part of the domestic security service MI5, that British organizations managing sensitive sites should be wary of using drones "manufactured in countries with coercive data sharing practices," a reference to China. Moreover, in 2022, the U.S. Department of Defense included DJI on a blacklist of Chinese firms with military ties.


Reconstruct and Match: Out-of-Distribution Robustness via Topological Homogeneity Chaoqi Chen 1 Luyao Tang 2 Hui Huang College of Computer Science and Software Engineering, Shenzhen University

Neural Information Processing Systems

Since deep learning models are usually deployed in non-stationary environments, it is imperative to improve their robustness to out-of-distribution (OOD) data. A common approach to mitigate distribution shift is to regularize internal representations or predictors learned from in-distribution (ID) data to be domain invariant. Past studies have primarily learned pairwise invariances, ignoring the intrinsic structure and high-order dependencies of the data. Unlike machines, humans recognize objects by first dividing them into major components and then identifying the topological relation of these components. Motivated by this, we propose Reconstruct and Match (REMA), a general learning framework for object recognition tasks to endow deep models with the capability of capturing the topological homogeneity of objects without human prior knowledge or fine-grained annotations. To identify major components from objects, REMA introduces a selective slot-based reconstruction module to dynamically map dense pixels into a sparse and discrete set of slot vectors in an unsupervised manner. Then, to model high-order dependencies among these components, we propose a hypergraph-based relational reasoning module that models the intricate relations of nodes (slots) with structural constraints. Experiments on standard benchmarks show that REMA outperforms state-of-the-art methods in OOD generalization and test-time adaptation settings.


A Mathematical Framework for Quantifying Transferability in Multi-source Transfer Learning Tsinghua-Berkeley Shenzhen Institute Massachusetts Institute of Technology Tsinghua University

Neural Information Processing Systems

Current transfer learning algorithm designs mainly focus on the similarities between source and target tasks, while the impacts of the sample sizes of these tasks are often not sufficiently addressed. This paper proposes a mathematical framework for quantifying the transferability in multi-source transfer learning problems, with both the task similarities and the sample complexity of learning models taken into account. In particular, we consider the setup where the models learned from different tasks are linearly combined for learning the target task, and use the optimal combining coefficients to measure the transferability. Then, we demonstrate the analytical expression of this transferability measure, characterized by the sample sizes, model complexity, and the similarities between source and target tasks, which provides fundamental insights of the knowledge transferring mechanism and the guidance for algorithm designs. Furthermore, we apply our analyses for practical learning tasks, and establish a quantifiable transferability measure by exploiting a parameterized model. In addition, we develop an alternating iterative algorithm to implement our theoretical results for training deep neural networks in multi-source transfer learning tasks. Finally, experiments on image classification tasks show that our approach outperforms existing transfer learning algorithms in multi-source and few-shot scenarios.
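As a rough illustration of the "linearly combined" setup described in this abstract, the sketch below fits a single combining coefficient for two source-model predictions by least squares on target samples. It is a toy under stated assumptions, not the paper's analytical transferability measure: the function name `optimal_mixing_weight`, the two-source restriction, the squared-error criterion, and the example numbers are all hypothetical choices made for illustration.

```python
def optimal_mixing_weight(preds_a, preds_b, targets):
    """Weight w in [0, 1] minimizing sum((w*a + (1-w)*b - y)^2).

    preds_a, preds_b: predictions of two (hypothetical) source models
    on the target samples; targets: the target labels.
    """
    # Setting the derivative of the squared error to zero gives
    # w = sum((y - b) * (a - b)) / sum((a - b)^2).
    numerator = sum((y - b) * (a - b) for a, b, y in zip(preds_a, preds_b, targets))
    denominator = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    if denominator == 0.0:
        # Both models predict identically, so any weight is equivalent.
        return 0.5
    # Clip so the result stays a convex combination of the two models.
    return min(1.0, max(0.0, numerator / denominator))


# Toy target whose labels are, by construction, 0.7*a + 0.3*b.
preds_a = [1.0, 2.0, 3.0, 4.0]
preds_b = [2.0, 1.0, 4.0, 3.0]
targets = [0.7 * a + 0.3 * b for a, b in zip(preds_a, preds_b)]
weight = optimal_mixing_weight(preds_a, preds_b, targets)
```

In this toy the recovered weight reflects only how well each source fits the target samples; the abstract's point is that a principled measure should also account for sample sizes and model complexity, which this sketch does not model.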


Towards Robust Multimodal Sentiment Analysis with Incomplete Data School of Data Science, The Chinese University of Hong Kong, Shenzhen

Neural Information Processing Systems

The field of Multimodal Sentiment Analysis (MSA) has recently witnessed an emerging direction seeking to tackle the issue of data incompleteness. Recognizing that the language modality typically contains dense sentiment information, we consider it as the dominant modality and present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust MSA. The proposed LNLN features a dominant modality correction (DMC) module and dominant modality based multimodal learning (DMML) module, which enhances the model's robustness across various noise scenarios by ensuring the quality of dominant modality representations. Aside from the methodical design, we perform comprehensive experiments under random data missing scenarios, utilizing diverse and meaningful settings on several popular datasets (e.g., MOSI, MOSEI, and SIMS), providing additional uniformity, transparency, and fairness compared to existing evaluations in the literature. Empirically, LNLN consistently outperforms existing baselines, demonstrating superior performance across these challenging and extensive evaluation metrics.


Graph Convolutional Kernel Machine versus Graph Convolutional Networks Zhihao Wu1 Shenzhen Research Institute of Big Data, Shenzhen, China

Neural Information Processing Systems

Graph convolutional networks (GCN) with one or two hidden layers have been widely used in handling graph data that are prevalent in various disciplines. Many studies showed that the gain of making GCNs deeper is tiny or even negative. This implies that the complexity of graph data is often limited and shallow models are often sufficient to extract expressive features for various tasks such as node classification.
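For context on the shallow models this abstract refers to, here is a minimal sketch of one graph convolutional layer using the widely used propagation rule H = ReLU(Â X W), where Â is the adjacency matrix with self-loops, symmetrically normalized by node degree. The toy graph, feature matrix, weight values, and helper names are assumptions for illustration and are not taken from the paper.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node also keeps its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_hat, features, weights):
    """One layer: aggregate neighbor features, project, apply ReLU."""
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


# Toy path graph 0 - 1 - 2 with two input features per node.
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 0.5]]  # arbitrary illustrative values

a_hat = normalize_adjacency(adjacency)
output = gcn_layer(a_hat, features, weights)
```

Stacking two such layers gives the kind of shallow GCN the abstract describes; each extra layer mixes in features from nodes one hop further away.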


Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation

arXiv.org Artificial Intelligence

Skeleton-based Temporal Action Segmentation (STAS) aims to segment and recognize various actions from long, untrimmed sequences of human skeletal movements. Current STAS methods typically employ spatio-temporal modeling to establish dependencies among joints as well as frames, and utilize one-hot encoding with cross-entropy loss for frame-wise classification supervision. However, these methods overlook the intrinsic correlations among joints and actions within skeletal features, leading to a limited understanding of human movements. To address this, we propose a Text-Derived Relational Graph-Enhanced Network (TRG-Net) that leverages prior graphs generated by Large Language Models (LLM) to enhance both modeling and supervision. For modeling, the Dynamic Spatio-Temporal Fusion Modeling (DSFM) method incorporates Text-Derived Joint Graphs (TJG) with channel- and frame-level dynamic adaptation to effectively model spatial relations, while integrating spatio-temporal core features during temporal modeling. For supervision, the Absolute-Relative Inter-Class Supervision (ARIS) method employs contrastive learning between action features and text embeddings to regularize the absolute class distributions, and utilizes Text-Derived Action Graphs (TAG) to capture the relative inter-class relationships among action features. Additionally, we propose a Spatial-Aware Enhancement Processing (SAEP) method, which incorporates random joint occlusion and axial rotation to enhance spatial generalization. Performance evaluations on four public datasets demonstrate that TRG-Net achieves state-of-the-art results. Temporal Action Segmentation (TAS), an advanced task in video understanding, aims to segment and recognize each action within long, untrimmed video sequences of human activities [1]. Similar to how semantic segmentation predicts labels for each pixel in an image, TAS predicts action labels for each frame in a video.
As a significant task in computer vision, TAS finds applications in various domains such as medical rehabilitation [2], industrial monitoring [3], and activity analysis [4]. Haoyu Ji, Bowen Chen, Weihong Ren, Wenze Huang, Zhihao Yang, Zhiyong Wang, and Honghai Liu are with the State Key Laboratory of Robotics and Systems, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China (e-mail: jihaoyu1224@gmail.com, The code is available at https://github.com/HaoyuJi/TRG-Net. The text embeddings and relational graphs generated by large language models can serve as priors for enhancing modeling and supervision of action segmentation. Specifically, the text-derived joint graph effectively captures spatial correlations, while the text-derived action graph and action embeddings supervise the relationships and distributions of action classes. Existing TAS methods can be broadly categorized into two types based on input modality: Video-based TAS (VTAS) and Skeleton-based TAS (STAS) [5]-[7].


Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective - Supplementary Material - Shenzhen International Graduate School, Tsinghua University

Neural Information Processing Systems

A.1 Basic setting Setting 1 Without loss of generality, we suppose that the long-tailed distribution satisfies some kind of exponential distribution with parameter [8]. Proof A.1 Following Basic Setting 1, when mixing factor Beta(,), consider a -Aug sample generated by ex Therefore, the head gets more regulation than the tail. On the one hand, the classification performance will be promoted. On the other hand, however, the performance gap between the head and tail still exists. Hence we can generalize Eq.A.9 to: Z y According to Eq.A.11, it's easy to find a derivative zero point in range [1,C]. UniMix Factor (green) alleviates such situation and the full pipeline ( = 1) constructs a more uniform distribution of -Aug (red), which contributes to a well-calibrated model. As illustrated in Fig.A1, when the class number C and imbalance factor get larger, the limitations of mixup in LT scenarios gradually appear. It has limited contribution for the tail class's feature learning and regulation, which is the reason for its poor calibration.
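The excerpt above discusses mixup with a Beta-distributed mixing factor, but its symbols were lost in extraction. As a reminder of the baseline being analyzed, the following is a minimal sketch of standard mixup only, not of the UniMix Factor proposed in the paper. The parameter name `alpha`, the symmetric Beta(alpha, alpha) choice, and the toy samples are assumptions.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha, rng):
    """Standard mixup of two samples with one-hot labels.

    Draws lam ~ Beta(alpha, alpha) and returns the convex combination
    of the inputs and of the labels, together with lam.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return mixed_x, mixed_y, lam


rng = random.Random(0)  # fixed seed so the sketch is repeatable
head_x, head_y = [1.0, 2.0], [1.0, 0.0]  # sample from a (hypothetical) head class
tail_x, tail_y = [3.0, 0.0], [0.0, 1.0]  # sample from a (hypothetical) tail class
mixed_x, mixed_y, lam = mixup(head_x, head_y, tail_x, tail_y, alpha=1.0, rng=rng)
```

In a long-tailed dataset most sampled pairs contain head-class examples, so mixed samples rarely carry much tail-class label mass, which is the limitation the excerpt attributes to mixup.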


Deep Reinforcement Learning-Based Bidding Strategies for Prosumers Trading in Double Auction-Based Transactive Energy Market

arXiv.org Artificial Intelligence

With the large number of prosumers deploying distributed energy resources (DERs), integrating these prosumers into a transactive energy market (TEM) is a trend for the future smart grid. A community-based double auction market is considered a promising TEM that can encourage prosumers to participate and maximize social welfare. However, the traditional TEM is challenging to model explicitly due to the random bidding behavior of prosumers and uncertainties caused by the energy operation of DERs. Furthermore, although reinforcement learning algorithms provide a model-free solution to optimize prosumers' bidding strategies, their use in TEM is still challenging due to their scalability, stability, and privacy protection limitations. To address the above challenges, in this study, we design a double auction-based TEM with multiple DERs-equipped prosumers to transparently and efficiently manage energy transactions. We also propose a deep reinforcement learning (DRL) model with distributed learning and execution to ensure the scalability and privacy of the market environment. Simulation results show that (1) the designed TEM and DRL model are robust; (2) the proposed DRL model effectively balances the energy payment and comfort satisfaction for prosumers and outperforms the state-of-the-art methods in optimizing the bidding strategies. With the extensive deployment of energy storage systems, solar photovoltaics (PVs), smart home appliances, and information technology, passive consumers in the traditional electricity market are gradually converted to active prosumers (producers + consumers) with distributed energy resources (DERs), who can monitor and control energy generation, consumption, storage, and transaction to achieve specific goals, such as balancing energy costs and user comfort levels [1]-[3].
However, the bi-directional energy and information flow, as well as the variability of distributed renewable energy, raises great challenges in the operation of power systems in a flexible and economically efficient way [4]. Liu are with the Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, USA (e-mail: jun3525114@gmail.com, Li, M. Ghafouri, and J. Yan are with Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada (e-mail: {yuanliang.li, L. Hou is with Beijing University of Posts and Telecommunications, Beijing, China (e-mail: luyang.hou@bupt.edu.cn) Zhang is with the College of Information Engineering, Shenzhen University, Shenzhen, China (e-mail: zhangp@szu.edu.cn)
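To make the double auction mechanism mentioned in this abstract concrete, the sketch below clears a simple single-unit double auction: buyers' bids are sorted from highest to lowest, sellers' asks from lowest to highest, and pairs are matched while the bid is at least the ask. The uniform midpoint pricing rule, the single-unit orders, the function name, and the prices are all assumptions; the paper's actual market rules may differ.

```python
def clear_double_auction(bids, asks):
    """Clear a single-unit double auction.

    bids: prices buyers are willing to pay (one unit each).
    asks: prices sellers are willing to accept (one unit each).
    Returns (number_of_trades, clearing_price); the price is None
    when no bid meets any ask.
    """
    sorted_bids = sorted(bids, reverse=True)  # most eager buyers first
    sorted_asks = sorted(asks)                # cheapest sellers first
    matched = 0
    for bid, ask in zip(sorted_bids, sorted_asks):
        if bid < ask:
            break  # no further mutually acceptable pairs
        matched += 1
    if matched == 0:
        return 0, None
    # One common convention (assumed here): a uniform price at the
    # midpoint of the last matched bid and ask.
    price = (sorted_bids[matched - 1] + sorted_asks[matched - 1]) / 2
    return matched, price


# Hypothetical prices in cents per kWh.
bids = [30, 25, 18, 10]
asks = [12, 15, 20, 28]
trades, price = clear_double_auction(bids, asks)
```

In a learning-based setting such as the one the abstract describes, each prosumer's agent would choose the bid or ask it submits, and a clearing routine of this kind would determine the resulting trades and payments.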


China's EV giants are betting big on humanoid robots

MIT Technology Review

According to statistics from Shenzhen New Strategy Media's Industrial Research Institute, there were over 160 humanoid-robot manufacturers worldwide as of June 2024, of which more than 60 were in China, more than 30 in the United States, and about 40 in Europe. In addition to having the largest number of manufacturers, China stands out for the way its EV sector is backing most of these robotics companies. Thanks in part to substantial government subsidies and concerted efforts from the tech sector, China has emerged as the world's largest EV market and manufacturer. In 2024, 54% of cars sold in China were electric or hybrid, compared with 8% in the US. China also became the first nation to reach an annual production of 10 million "new energy vehicles" (NEVs), a category that includes all vehicles powered partly or entirely by electricity.